Robust Cross-lingual Hypernymy Detection using Dependency Context
Abstract
Cross-lingual hypernymy detection involves determining whether a word in one language (“fruit”) is a hypernym of a word in another language (“pomme”, i.e., “apple” in French). The ability to detect hypernymy cross-lingually can aid in solving cross-lingual versions of tasks such as textual entailment and event coreference. We propose BISPARSE-DEP, a family of unsupervised approaches for cross-lingual hypernymy detection, which learns sparse, bilingual word embeddings based on dependency contexts. We show that BISPARSE-DEP can significantly improve performance on this task, compared to approaches based only on lexical context. Our approach is also robust, showing promise for low-resource settings: our dependency-based embeddings can be learned using a parser trained on related languages, with negligible loss in performance. We also crowd-source a challenging dataset for this task in four languages: Russian, French, Arabic, and Chinese. Our embeddings and datasets are publicly available.
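The abstract does not say which scoring function is applied to the sparse bilingual embeddings. As an illustration only, one common unsupervised way to score hypernymy over sparse, non-negative vectors is a distributional inclusion measure such as WeedsPrec, which asks how much of the narrower word's feature mass is also present for the broader word. The sketch below assumes that words from both languages share the same embedding dimensions; the function name, the dimension labels (`d1`, `d2`, `d3`), and the toy weights are hypothetical and are not taken from the paper.

```python
from typing import Dict

# Sparse vector: shared bilingual dimension id -> non-negative weight.
SparseVec = Dict[str, float]


def weeds_prec(narrow: SparseVec, broad: SparseVec) -> float:
    """Share of the narrower word's feature mass that also occurs for the broader word."""
    total = sum(narrow.values())
    if total == 0.0:
        return 0.0
    # Sum the narrow word's weights on dimensions that are active for the broad word.
    shared = sum(w for dim, w in narrow.items() if broad.get(dim, 0.0) > 0.0)
    return shared / total


# Toy vectors, invented for illustration (not real embeddings).
pomme = {"d1": 0.5, "d2": 0.5}               # French "pomme" (apple)
fruit = {"d1": 0.25, "d2": 0.5, "d3": 0.25}  # English "fruit"

score_forward = weeds_prec(pomme, fruit)   # is "fruit" a hypernym of "pomme"?
score_backward = weeds_prec(fruit, pomme)  # the reverse direction
```

With these toy values the forward score is higher than the backward score, which is the asymmetry an inclusion measure relies on: the contexts of the narrower term are expected to be largely contained in those of the broader term, but not the other way around.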
Similar resources
Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection
The fundamental role of hypernymy in NLP has motivated the development of many methods for the automatic identification of this relation, most of which rely on word distribution. We investigate an extensive number of such unsupervised measures, using several distributional semantic models that differ by context type and feature weighting. We analyze the performance of the different methods base...
Cross-lingual Dependency Transfer: What Matters? Assessing the Impact of Pre- and Post-processing
In this paper, we propose to analyze the pre- and post-processing steps applied in the context of cross-lingual dependency transfer. To this aim, we employ a simple transfer strategy that operates on partially annotated projected data. We show that a good data selection strategy is a key point in successfully transferring dependencies and that better data selection techniques need to be developed...
Automatic Cross-Lingual Similarization of Dependency Grammars for Tree-based Machine Translation
Structural isomorphism between languages benefits the performance of cross-lingual applications. We propose an automatic algorithm for cross-lingual similarization of dependency grammars, which automatically learns grammars with high cross-lingual similarity. The algorithm similarizes the annotation styles of the dependency grammars for two languages in the level of classification decisions, an...
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
Cross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study
We present a study of cross-lingual direct transfer parsing for the Irish language. Firstly we discuss mapping of the annotation scheme of the Irish Dependency Treebank to a universal dependency scheme. We explain our dependency label mapping choices and the structural changes required in the Irish Dependency Treebank. We then experiment with the universally annotated treebanks of ten languages...